\section{Key ideas}
The idea is to use simple texts and images from children's books and
let the program teach itself the connection between keywords that
name visual objects and images of those objects. After the program
has ``read'' a book containing images of a ball and related text with
the word \emph{ball}, it should be able to correctly label images of
a ball as \emph{ball} (see Figure~\ref{duy}).

To do this, we need four main components:

\begin{itemize}
\item \textbf{Image-recognition} A component that, given a new image
  of an object, compares it with already known objects and either
  deduces which object it is or classifies it as an unknown object.
  This part will be able to operate without first being trained. Thus
  it will be capable of unsupervised learning. (We could maybe use some
  part of OpenCV or the Vision Toolkit developed by Numenta for this.)

\item \textbf{Natural language processing} We will need a component
  that can read simple text and extract words or sentences that are
  likely to represent objects that appear in the images. For this we
  plan to use the \textit{Python Natural Language
    Toolkit}~\cite{nltk}.

\item \textbf{Reasoning system} A component that, given the objects in
  one image and the words representing objects, compares them with
  material from other pages in the book and deduces which words
  should (probably) be combined with which objects. (Bayesian
  Networks/Trees?)

\item \textbf{Object-detection} A component that takes an image as
  input and outputs one image for each object shown in the original
  image. (We could maybe use some part of OpenCV for this.) \emph{We
    may not have time to fully complete this part of the project; if
    not, we will do it manually by associating small images of
    individual objects with each page, instead of one big image.}

\end{itemize}

In the end, reading one book should result in a collection of sample
images representing the objects in the book, with a name tag for each
object. Example:

\begin{figure}[H]
    \begin{center}
    \includegraphics[width=0.75\textwidth]{CalvinHobbes.jpg}
    \caption{Page 1: This is Calvin and Hobbes.}
    \end{center}
\end{figure}

Analyzing this page, we get two objects (or possibly more) and two
words, \emph{Calvin} and \emph{Hobbes}.

\begin{figure}[H]
    \begin{center}
    \includegraphics[width=0.75\textwidth]{calvin.jpg}
    \caption{Page 2: Calvin has a ball.}
    \end{center}
\end{figure}

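Now, given pages 1 and 2, we can deduce which object is Calvin, which
object is the ball and which object is Hobbes. Assuming each page
yields exactly the two objects named in its text, \emph{Calvin} is the
only word, and Calvin the only object, common to both pages; the
remaining word and object on each page must then belong together.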
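
The deduction over pages 1 and 2 can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions,
not the planned reasoning system: it uses plain set intersection and
elimination rather than Bayesian networks, it assumes the
image-recognition component already gives the same identifier to the
same object on different pages, and the identifiers (\texttt{obj\_a},
\texttt{obj\_b}, \texttt{obj\_c}) and the function name are
hypothetical.

```python
def match_words_to_objects(pages):
    """Pair each word with one object by intersection and elimination.

    `pages` is a list of (words, objects) pairs, one pair per page,
    where both elements are sets. Object identifiers are hypothetical
    labels assumed to come from the image-recognition component.
    """
    all_words = set().union(*(words for words, _ in pages))

    # A word can only name an object that is present on every page
    # where the word itself occurs.
    candidates = {}
    for word in all_words:
        object_sets = [objs for words, objs in pages if word in words]
        candidates[word] = set.intersection(*object_sets)

    # Repeatedly fix words that have exactly one candidate left,
    # removing already assigned objects from the other candidates.
    resolved = {}
    progress = True
    while progress:
        progress = False
        for word, objs in candidates.items():
            if word in resolved:
                continue
            remaining = objs - set(resolved.values())
            if len(remaining) == 1:
                resolved[word] = next(iter(remaining))
                progress = True
    return resolved


# Page 1: "This is Calvin and Hobbes."  Page 2: "Calvin has a ball."
pages = [
    ({"Calvin", "Hobbes"}, {"obj_a", "obj_b"}),
    ({"Calvin", "ball"}, {"obj_a", "obj_c"}),
]
result = match_words_to_objects(pages)
```

Words that remain ambiguous after elimination are simply left out of
the result; a probabilistic reasoning system would instead keep a
ranking of the possible pairings.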

